Video segmentation consists of a frame-by-frame selection process of meaningful areas related to foreground moving objects. Some applications include traffic monitoring, human tracking, action recognition, efficient video surveillance, and anomaly detection. In these applications, it is not rare to face challenges such as abrupt changes in weather conditions, illumination issues, shadows, subtle dynamic background motions, and also camouflage effects. In this work, we address such shortcomings by proposing a novel deep learning video segmentation approach that incorporates residual information into the foreground detection learning process. The main goal is to provide a method capable of generating an accurate foreground detection given a grayscale video. Experiments conducted on the Change Detection 2014 and on the private dataset PetrobrasROUTES from Petrobras support the effectiveness of the proposed approach concerning some state-of-the-art video segmentation techniques, with overall F-measures of $\mathbf{0.9535}$ and $\mathbf{0.9636}$ in the Change Detection 2014 and PetrobrasROUTES datasets, respectively. Such a result places the proposed technique amongst the top 3 state-of-the-art video segmentation methods, besides comprising approximately seven times less parameters than its top one counterpart.
translated by 谷歌翻译
Scene change detection is an image processing problem related to partitioning pixels of a digital image into foreground and background regions. Mostly, visual knowledge-based computer intelligent systems, like traffic monitoring, video surveillance, and anomaly detection, need to use change detection techniques. Amongst the most prominent detection methods, there are the learning-based ones, which besides sharing similar training and testing protocols, differ from each other in terms of their architecture design strategies. Such architecture design directly impacts on the quality of the detection results, and also in the device resources capacity, like memory. In this work, we propose a novel Multiscale Cascade Residual Convolutional Neural Network that integrates multiscale processing strategy through a Residual Processing Module, with a Segmentation Convolutional Neural Network. Experiments conducted on two different datasets support the effectiveness of the proposed approach, achieving average overall $\boldsymbol{F\text{-}measure}$ results of $\boldsymbol{0.9622}$ and $\boldsymbol{0.9664}$ over Change Detection 2014 and PetrobrasROUTES datasets respectively, besides comprising approximately eight times fewer parameters. Such obtained results place the proposed technique amongst the top four state-of-the-art scene change detection methods.
translated by 谷歌翻译
Research on remote sensing image classification significantly impacts essential human routine tasks such as urban planning and agriculture. Nowadays, the rapid advance in technology and the availability of many high-quality remote sensing images create a demand for reliable automation methods. The current paper proposes two novel deep learning-based architectures for image classification purposes, i.e., the Discriminant Deep Image Prior Network and the Discriminant Deep Image Prior Network+, which combine Deep Image Prior and Triplet Networks learning strategies. Experiments conducted over three well-known public remote sensing image datasets achieved state-of-the-art results, evidencing the effectiveness of using deep image priors for remote sensing image classification.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
人类机器人相互作用(HRI)对于在日常生活中广泛使用机器人至关重要。机器人最终将能够通过有效的社会互动来履行人类文明的各种职责。创建直接且易于理解的界面,以与机器人开始在个人工作区中扩散时与机器人互动至关重要。通常,与模拟机器人的交互显示在屏幕上。虚拟现实(VR)是一个更具吸引力的替代方法,它为视觉提示提供了更像现实世界中看到的线索。在这项研究中,我们介绍了Jubileo,这是一种机器人的动画面孔,并使用人类机器人社会互动领域的各种研究和应用开发工具。Jubileo Project不仅提供功能齐全的开源物理机器人。它还提供了一个全面的框架,可以通过VR接口进行操作,从而为HRI应用程序测试带来沉浸式环境,并明显更好地部署速度。
translated by 谷歌翻译
在整个计算科学中,越来越需要利用原始计算马力的持续改进,通过对蛮力的尺度锻炼的尺度增加,以增加网状元素数量的增加。例如,如果不考虑分子水平的相互作用,就不可能对纳米多孔介质的转运进行定量预测,即从紧密的页岩地层提取至关重要的碳氢化合物。同样,惯性限制融合模拟依赖于数值扩散来模拟分子效应,例如非本地转运和混合,而无需真正考虑分子相互作用。考虑到这两个不同的应用程序,我们开发了一种新颖的功能,该功能使用主动学习方法来优化局部细尺度模拟的使用来告知粗尺度流体动力学。我们的方法解决了三个挑战:预测连续性粗尺度轨迹,以推测执行新的精细分子动力学计算,动态地更新细度计算中的粗尺度,并量化神经网络模型中的不确定性。
translated by 谷歌翻译
自动生物医学图像分析的领域至关重要地取决于算法验证的可靠和有意义的性能指标。但是,当前的度量使用通常是不明智的,并且不能反映基本的域名。在这里,我们提出了一个全面的框架,该框架指导研究人员以问题意识的方式选择绩效指标。具体而言,我们专注于生物医学图像分析问题,这些问题可以解释为图像,对象或像素级别的分类任务。该框架首先编译域兴趣 - 目标结构 - ,数据集和算法与输出问题相关的属性的属性与问题指纹相关,同时还将其映射到适当的问题类别,即图像级分类,语义分段,实例,实例细分或对象检测。然后,它指导用户选择和应用一组适当的验证指标的过程,同时使他们意识到与个人选择相关的潜在陷阱。在本文中,我们描述了指标重新加载推荐框架的当前状态,目的是从图像分析社区获得建设性的反馈。当前版本是在由60多个图像分析专家的国际联盟中开发的,将在社区驱动的优化之后公开作为用户友好的工具包提供。
translated by 谷歌翻译
通常,基于生物谱系的控制系统可能不依赖于各个预期行为或合作适当运行。相反,这种系统应该了解未经授权的访问尝试的恶意程序。文献中提供的一些作品建议通过步态识别方法来解决问题。这些方法旨在通过内在的可察觉功能来识别人类,尽管穿着衣服或配件。虽然该问题表示相对长时间的挑战,但是为处理问题的大多数技术存在与特征提取和低分类率相关的几个缺点,以及其他问题。然而,最近的深度学习方法是一种强大的一组工具,可以处理几乎任何图像和计算机视觉相关问题,为步态识别提供最重要的结果。因此,这项工作提供了通过步态认可的关于生物识别检测的最近作品的调查汇编,重点是深入学习方法,强调他们的益处,暴露出弱点。此外,它还呈现用于解决相关约束的数据集,方法和体系结构的分类和表征描述。
translated by 谷歌翻译
机器学习(ML)涵盖的实验必须考虑评估模型性能的两个重要方面:数据集和算法。需要强大的基准来评估最佳分类器。为此,可以采用公共存储库中提供的金标准基准。但是,常常不考虑在评估时考虑数据集的复杂性。这项工作提出了一种基于物品响应理论(IRT)和GLICKO-2的组合的新评估方法,该方法通常采用了评估参与者的强度(例如,国际象棋)。对于基准测试中的每个数据集,IRT用于估计分类器的能力,良好的分类器对最困难的测试实例具有良好的预测。然后为每对分类器运行锦标赛,以便GLICKO-2更新每个分类器等额定值,评级偏差和波动等性能信息。在此进行了一个案例研究,该研究通过了OpenML-CC18基准作为数据集的集合和各种分类算法的池进行评估。并非所有数据集都被观察到对评估算法非常有用,其中只有10%被认为是非常困难的。此外,验证了仅包含50%的OpenML-CC18的50%的子集的存在,其同样有用于算法评估。关于算法,本文提出的方法将随机林识别为具有最佳天生能力的算法。
translated by 谷歌翻译
Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and point's permutation, which generates PCDs that are geometrically consistent and completed properly. Experiments on a wide range of partial PCD show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
translated by 谷歌翻译